Use newer version of mma_atom and copy_atom in 00_bmg_gemm #540

anamikac-intel · 2025-09-29T06:16:37Z

Modify 00_bmg_gemm to include new mma and copy atoms (#477).
00_bmg_gemm combines two parts: mma and epilogue. Both currently use the old atoms, so adopting the new atoms means updating both. As a first step, we will:

- Keep CollectiveEpilogue unchanged for now
- Only modify CollectiveMma first

Old Atom:

Problem Size: 5120x4096x4096x1
Cutlass GEMM Performance: [96.448]TFlop/s (1.7813)ms

New Atom:

Problem Size: 5120x4096x4096x1
Cutlass GEMM Performance: [97.259]TFlop/s (1.7664)ms

petercad

Approving with the minor changes suggested above.

Edit -- there is a bug in the TiledCopy handling that needs fixing, described below.

sanchitintel · 2025-10-15T20:57:55Z

Hi @anamikac-intel, with this PR, I'm encountering the same errors locally as the CI.
Are you using a more recent igc version at your end?
I'm using https://github.com/intel/intel-graphics-compiler/releases/tag/v2.18.5.

Thanks!

petercad · 2025-10-15T22:02:36Z

> With New Atom perf increase by 2x

Theoretical bf16 peak perf for BMG is 116 TF/s, so the new performance is too high. Either there's a problem in the kernel (not doing the full computation) or something's wrong with the performance computation.
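The plausibility check described here can be sketched in a few lines. This is a minimal sketch, assuming the conventional GEMM operation count of 2·M·N·K·L (one multiply and one add per element of the K reduction); that count reproduces the figures in the PR description to within rounding of the reported runtime. The 116 TF/s peak is taken from the comment above; the helper names `gemm_flops`, `gemm_tflops`, and `is_plausible` are hypothetical and not part of CUTLASS.

```cpp
#include <cstdint>

// Theoretical bf16 peak for BMG, as quoted in the comment above (TFLOP/s).
constexpr double kPeakTflops = 116.0;

// Floating-point operations for a batched GEMM of size M x N x K x L:
// one multiply and one add for every (m, n, k) triple in each batch.
constexpr double gemm_flops(std::int64_t m, std::int64_t n,
                            std::int64_t k, std::int64_t l) {
  return 2.0 * static_cast<double>(m) * static_cast<double>(n) *
         static_cast<double>(k) * static_cast<double>(l);
}

// Achieved throughput in TFLOP/s for a runtime given in milliseconds.
constexpr double gemm_tflops(std::int64_t m, std::int64_t n,
                             std::int64_t k, std::int64_t l,
                             double runtime_ms) {
  return gemm_flops(m, n, k, l) / (runtime_ms * 1e-3) / 1e12;
}

// A measurement above the hardware peak means either the kernel skipped work
// or the performance computation is wrong.
constexpr bool is_plausible(double tflops, double peak = kPeakTflops) {
  return tflops > 0.0 && tflops <= peak;
}
```

For the 5120x4096x4096x1 problem, `gemm_tflops(5120, 4096, 4096, 1, 1.7664)` gives roughly 97.26 TFLOP/s, which is below the peak, whereas a value twice the old-atom result would fail `is_plausible`.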

tdeng5 · 2025-10-16T02:26:56Z

We checked the performance of several shapes:

sanchitintel · 2025-10-18T23:53:18Z

include/cutlass/gemm/collective/xe_mma.hpp

  using ArchTag = typename DispatchPolicy::ArchTag;

-  static_assert(platform::is_same<ElementA, ElementB>::value, "MainloopIntelXeXMX16 requires that A and B have same type.");
+  static_assert(platform::is_same<ElementA, ElementB>::value, "MainloopXeL1Staged requires that A and B have same type.");


In the existing MMA collective code, we use variable names ATOM_M, ATOM_N, ATOM_K incorrectly, because they don't correspond to the underlying MMA atom, but to our tiling scheme instead.

  static constexpr int ATOM_M = get<1>(typename TiledMma::ThrLayoutVMNK{}.shape());
  static constexpr int ATOM_N = get<2>(typename TiledMma::ThrLayoutVMNK{}.shape());
  static constexpr int ATOM_K = get<3>(typename TiledMma::ThrLayoutVMNK{}.shape());

Workgroup tiles are divided spatially into sub-group fragments/tiles.

For example, ATOM_M is actually the number of subgroup tiles that a workgroup tile is partitioned into along M. That is, ATOM_M equals WG_M/SG_M and does not represent the MMA atom's M dimension.
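To make the distinction concrete, here is a minimal sketch in plain C++ (no CuTe types). The tile extents and the names `SG_COUNT_M`, `SG_COUNT_N`, and `subgroups_per_tile` are hypothetical, chosen only for illustration; they are not taken from this PR.

```cpp
// Hypothetical tile extents, for illustration only.
constexpr int WG_M = 256, WG_N = 256;  // workgroup tile
constexpr int SG_M = 32,  SG_N = 64;   // subgroup tile

// Number of subgroup tiles that fit along one dimension of a workgroup tile.
constexpr int subgroups_per_tile(int wg_extent, int sg_extent) {
  return wg_extent / sg_extent;
}

static_assert(WG_M % SG_M == 0 && WG_N % SG_N == 0,
              "the workgroup tile must divide evenly into subgroup tiles");

// These are the quantities the collective currently names ATOM_M and ATOM_N:
// subgroup counts per workgroup tile, not dimensions of the MMA atom.
constexpr int SG_COUNT_M = subgroups_per_tile(WG_M, SG_M);
constexpr int SG_COUNT_N = subgroups_per_tile(WG_N, SG_N);

static_assert(SG_COUNT_M == 8 && SG_COUNT_N == 4,
              "8 x 4 = 32 subgroups per workgroup for these extents");
```

With these assumed extents the workgroup holds 8 x 4 subgroups, regardless of which MMA atom each subgroup uses, which is why a name such as `SG_COUNT_M` describes the value more accurately than `ATOM_M`.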

Can we rename these variables in this PR? It's not necessary for correctness, but it would make the code easier to understand.

Thanks!

taozha2 · 2025-10-20

I agree with you, but we should fix it in another PR. Our new feature in the latest release strongly depends on this PR, and we expect it to be merged ASAP.

anamikac-intel · 2025-10-19T11:29:01Z

Performance results: new vs legacy implementation on different problem sizes (Tested on IGC 2.20)

Anamika Chatterjee added 2 commits September 29, 2025 08:10

Test commit

83f1de8

Enable new mma and copy atoms

0b184f0

anamikac-intel marked this pull request as ready for review September 29, 2025 08:11

anamikac-intel changed the title ~~Use newer version on mma_atom and copy_atom in 00_bmg_gemm~~ Use newer version of mma_atom and copy_atom in 00_bmg_gemm Sep 30, 2025

Anamika Chatterjee added 2 commits September 30, 2025 15:39

adding legacy code back for collectivemma and gemmuniversal

ef1bafa

delete unwanted file

f210ba3

petercad reviewed Sep 30, 2025

include/cutlass/gemm/collective/xe_mma.hpp (outdated)

petercad reviewed Sep 30, 2025

include/cutlass/gemm/collective/xe_mma.hpp (outdated)

petercad reviewed Sep 30, 2025

examples/00_bmg_gemm/00_bmg_gemm.cpp (outdated)

petercad reviewed Sep 30, 2025

examples/00_bmg_gemm/CMakeLists.txt (outdated)

petercad reviewed Sep 30, 2025

include/cutlass/gemm/kernel/gemm_universal_decl.h (outdated)

petercad reviewed Sep 30, 2025

include/cutlass/gemm/kernel/xe_gemm.hpp (outdated)

Anamika Chatterjee added 2 commits October 1, 2025 12:22

Changes added based on feedback

5f5a8b7

Remove xe_gemm_legacy as its not longer used

c55ac28

rolandschulz reviewed Oct 1, 2025

examples/00_bmg_gemm/00_bmg_gemm.cpp

petercad reviewed Oct 2, 2025

include/cutlass/gemm/collective/xe_mma.hpp (outdated)

petercad reviewed Oct 2, 2025

examples/00_bmg_gemm/00_bmg_gemm.cpp (outdated)

Changes added based on feedback

946b46c

petercad reviewed Oct 3, 2025

include/cutlass/gemm/collective/xe_mma.hpp (outdated)

kausikmaiti reviewed Oct 4, 2025

Anamika Chatterjee and others added 5 commits October 4, 2025 17:52

Applied review comments

c97f011

Add compile-time checks to enforce new XE copy atoms in block 2D functions

9691e60

Modified static assert message

93b076a

Modified static assert message

a6f068c

Merge branch 'intel:main' into anamikac/add-newatoms

fcbfecf

petercad mentioned this pull request Oct 7, 2025

[CuTe] [Xe] Fix make_block_2d_copy_* for batched tensors #549

Merged

Move legacy example to legacy folder, pass 2D strides to make_block_2d_copy_*, and move tensor/copy initialization to host-side params in to_underlying_arguments

e1e64f7

rolandschulz reviewed Oct 9, 2025

examples/00_bmg_gemm/00_bmg_gemm.cpp (outdated)

anamikac-intel mentioned this pull request Oct 10, 2025

Use newer version of mma_atom and copy_atom in CollectiveEpilogue for 00_bmg_gemm test #553

Open

Applied reviwer comment

ea67069

Anamika Chatterjee and others added 4 commits October 10, 2025 11:37

This is an empty commit

e9878b9

Preventing exceptions on older IGC versions

fbb7bb5

Remove unwanted returns from device-side params

4fb70c0

Modify compile-time checks to enforce new XE copy atoms in block 2D functions

4fd4376

petercad reviewed Oct 15, 2025

examples/00_bmg_gemm/00_bmg_gemm.cpp (outdated)

petercad reviewed Oct 15, 2025

examples/00_bmg_gemm/00_bmg_gemm.cpp (outdated)

petercad reviewed Oct 15, 2025

include/cute/atom/copy_traits_xe_2d.hpp (outdated)

petercad reviewed Oct 15, 2025

include/cute/atom/copy_traits_xe_2d.hpp (outdated)

petercad reviewed Oct 15, 2025

include/cutlass/gemm/collective/xe_mma.hpp (outdated)

petercad approved these changes Oct 15, 2025

sanchitintel reviewed Oct 16, 2025

include/cutlass/gemm/collective/xe_mma.hpp (outdated)

sanchitintel reviewed Oct 16, 2025

examples/00_bmg_gemm/00_bmg_gemm.cpp

petercad reviewed Oct 16, 2025

include/cutlass/gemm/collective/xe_mma.hpp (outdated)

petercad reviewed Oct 16, 2025

include/cutlass/gemm/collective/xe_mma.hpp (outdated)

tdeng5 pushed a commit that referenced this pull request Oct 17, 2025

[CuTe] [Xe] Fix make_block_2d_copy_* for batched tensors (#549)

35e80e1

Fixes a compilation failure found in #540 when >2D tensors are passed to one of the `make_block_2d_copy_*` functions.

Applied review comments

ca503bf

Antonyvance added the urgent label (PR requires urgent attention, for release or blocking another PR) Oct 17, 2025

Antonyvance added this to the 0.6 milestone Oct 17, 2025

sanchitintel reviewed Oct 18, 2025

Add batch_idx to global tensor passed to make_block_2d_copy_* and Block 2D Copy Utilities

4eb3bf3

8 participants
